
    Handwriting Recognition of Historical Documents with few labeled data

    Historical documents present many challenges for offline handwriting recognition systems, among them the segmentation and labeling steps. Carefully annotated text lines are needed to train an HTR system. In some scenarios, transcripts are only available at the paragraph level, with no text-line information. In this work, we demonstrate how to train an HTR system with few labeled data. Specifically, we train a deep convolutional recurrent neural network (CRNN) system on only 10% of the manually labeled text-line data of a dataset and propose an incremental training procedure that covers the rest of the data. Performance is further increased by augmenting the training set with specially crafted multiscale data. We also propose a model-based normalization scheme that accounts for the variability of the writing scale at the recognition phase. We apply this approach to the publicly available READ dataset. Our system achieved the second-best result in the ICDAR 2017 competition.
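    The incremental procedure described above amounts to a self-training loop: train on the small labeled subset, transcribe the unlabeled data, and fold confident pseudo-labels back into the training set. Below is a minimal sketch of that loop, using a toy scikit-learn classifier on synthetic data as a stand-in for the CRNN recognizer; the confidence threshold, data, and number of rounds are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in data: 1000 "text lines", only 10% manually labeled (as in the paper)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_lab, y_lab, X_unlab = X[:100], y[:100], X[100:]

model = LogisticRegression().fit(X_lab, y_lab)
for _ in range(5):                       # incremental rounds over the rest of the data
    if len(X_unlab) == 0:
        break
    proba = model.predict_proba(X_unlab)
    keep = proba.max(axis=1) > 0.95      # keep only confident pseudo-labels
    if not keep.any():
        break
    X_lab = np.vstack([X_lab, X_unlab[keep]])
    y_lab = np.concatenate([y_lab, proba.argmax(axis=1)[keep]])
    X_unlab = X_unlab[~keep]
    model = LogisticRegression().fit(X_lab, y_lab)   # retrain on the enlarged set

print(model.score(X, y))                 # sanity check on the full toy set
```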

    Synchronous Alignment

    In speaker verification, a likelihood ratio criterion is generally used to verify the claimed identity. This is done using two independent models, i.e. a client model and a world model. It may be interesting to make both models share the same topology, which represents the underlying phonetic structure, and then to consider two different output distributions corresponding to the client/world hypotheses. Based on this idea, a decoding algorithm and the corresponding training algorithm were derived. The first experiments, on a significant telephone database, show a small improvement with respect to the reference system. We can conclude that synchronous alignment provides at least equivalent results to the reference system, with a reduced-complexity decoding algorithm. Other important perspectives can be derived.
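    The underlying decision rule is the likelihood ratio test between the client and world hypotheses. Here is a minimal sketch of that test, with one-dimensional Gaussian stand-ins for the two models; the distributions and the threshold are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

client = norm(loc=1.0, scale=1.0)    # stand-in client model
world = norm(loc=0.0, scale=1.5)     # stand-in world (non-client) model

def verify(features, threshold=0.0):
    """Accept the claimed identity if the average log-likelihood ratio
    of the observed features exceeds the threshold."""
    llr = np.mean(client.logpdf(features) - world.logpdf(features))
    return llr > threshold

samples = client.rvs(size=50, random_state=0)   # a genuine client access
print(verify(samples))
```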

    Latent Semantic Indexing by Self-Organizing Map

    An important problem in information retrieval from spoken documents is how to retrieve relevant documents that have been poorly decoded by the speech recognizer. In this paper we propose a stochastic index for the documents based on Latent Semantic Analysis (LSA) of the decoded document contents. The original LSA approach uses Singular Value Decomposition to reduce the dimensionality of the documents. As an alternative, we propose a computationally more feasible solution using Random Mapping (RM) and Self-Organizing Maps (SOMs). The motivation for clustering the documents with a SOM is to reduce the effect of recognition errors and to extract new characteristic index terms. Experimental indexing results are presented using relevance judgments for the retrieval results of test queries, and using a document perplexity defined in this paper to measure the power of the index models.
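    As a rough sketch of the RM + SOM pipeline: term-document vectors are projected to a low dimension with a random matrix (in place of SVD), then clustered on a small SOM grid whose units serve as index categories. Everything below is synthetic and the grid size, learning schedule, and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.poisson(0.1, size=(200, 5000)).astype(float)   # toy term-count vectors

# Random Mapping: a cheap alternative to SVD for dimensionality reduction
R = rng.normal(size=(5000, 100)) / np.sqrt(100)
X = docs @ R

# Tiny SOM: a 6x6 grid of prototype vectors trained by neighborhood updates
grid = rng.normal(size=(6, 6, 100))
ii, jj = np.meshgrid(np.arange(6), np.arange(6), indexing="ij")
for t in range(2000):
    x = X[rng.integers(len(X))]
    d = np.linalg.norm(grid - x, axis=2)
    bi, bj = np.unravel_index(d.argmin(), d.shape)    # best-matching unit
    lr = 0.5 * (1 - t / 2000)                         # decaying learning rate
    sigma = 3.0 * (1 - t / 2000) + 0.5                # decaying neighborhood radius
    h = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2 * sigma ** 2))
    grid += lr * h[:, :, None] * (x - grid)

# Each document is indexed by its best-matching map unit (its cluster)
d_all = np.linalg.norm(X[:, None, None, :] - grid[None], axis=3)
labels = d_all.reshape(len(X), -1).argmin(axis=1)
print(np.bincount(labels, minlength=36))              # documents per map unit
```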

    Direction of Arrival Estimation using EM-ESPRIT with nonuniform arrays

    This paper deals with the problem of Direction Of Arrival (DOA) estimation with nonuniform linear arrays. The proposed method is based on the Expectation-Maximization (EM) method, where ESPRIT is used in the maximization step. The key idea is to iteratively interpolate the data onto a virtual uniform linear array so that ESPRIT can be applied to estimate the DOAs. The iterative approach improves the interpolation using the previously estimated DOAs. One of this method's novelties lies in its ability to deal with any nonuniform array geometry. The technique shows significant performance and computational advantages over previous algorithms such as spectral MUSIC, EM-IQML, and the method based on the manifold separation technique. EM-ESPRIT is shown to be more robust to additive noise. Furthermore, EM-ESPRIT fully exploits the advantages of a nonuniform array over a uniform one: simulations show that, for the same aperture and with fewer sensors, the nonuniform array achieves performance almost identical to that of the equivalent uniform array.
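    A minimal sketch of the iterate-interpolate-estimate loop follows, under strong simplifying assumptions: a single source, real sensors lying on a half-wavelength grid, and an E-step that simply fills the missing virtual sensors from the current DOA estimate. The paper's actual interpolation and geometry handling are more general.

```python
import numpy as np

rng = np.random.default_rng(0)
ula = np.arange(8) * 0.5                  # virtual half-wavelength ULA (in wavelengths)
obs = np.array([0, 1, 3, 4, 7])           # real sensors: a nonuniform subset of the grid
mis = np.array([2, 5, 6])                 # missing (virtual) sensors
true_doa = np.deg2rad(20.0)

def steering(p, theta):
    """Far-field steering vectors for sensors at positions p (in wavelengths)."""
    return np.exp(2j * np.pi * p[:, None] * np.sin(np.atleast_1d(theta)))

N = 200
s = (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)
noise = 0.05 * (rng.normal(size=(len(obs), N)) + 1j * rng.normal(size=(len(obs), N)))
X = steering(ula[obs], true_doa) * s + noise        # nonuniform-array snapshots

def esprit(Y, k=1):
    """DOAs from ULA data via the rotational invariance of the signal subspace."""
    R = Y @ Y.conj().T / Y.shape[1]
    _, V = np.linalg.eigh(R)
    Es = V[:, -k:]                                   # signal subspace
    phi = np.linalg.lstsq(Es[:-1], Es[1:], rcond=None)[0]
    return np.arcsin(np.angle(np.linalg.eigvals(phi)) / np.pi)

theta = np.deg2rad(5.0)                              # rough initial DOA
for _ in range(10):
    # E-step (simplified): complete the virtual ULA data at the missing sensors
    s_hat = np.linalg.pinv(steering(ula[obs], theta)) @ X
    Y = np.empty((len(ula), N), dtype=complex)
    Y[obs], Y[mis] = X, steering(ula[mis], theta) @ s_hat
    # M-step: ESPRIT on the completed (virtual) ULA data
    theta = esprit(Y)[0]

print(np.rad2deg(theta))                             # should approach 20 degrees
```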

    CLIENT / WORLD MODEL SYNCHRONOUS ALIGNMENT FOR SPEAKER VERIFICATION

    In speaker verification, two independent stochastic models, i.e. a client model and a non-client (world) model, are generally used to verify the claimed identity through a likelihood ratio score. This paper investigates a variant of this approach based on a common hidden process for both models. In this framework, both models share the same topology, which is conditioned by the underlying phonetic structure of the utterance. Two different output distributions are then defined, corresponding to the client vs. world hypotheses. Based on this idea, a synchronous decoding algorithm and the corresponding training algorithm are derived. Our first experiments on the SESP telephone database indicate a slight improvement with respect to a baseline system using independent alignments. Moreover, synchronous alignment offers reduced complexity during the decoding process and opens interesting perspectives. Keywords: Stochastic Modeling, HMM, Synchronous Alignment, EM algorithm.
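    The computational saving comes from running a single Viterbi pass over the shared topology and scoring both output distributions along the same state path, instead of aligning twice. The sketch below illustrates this with a toy left-to-right HMM and Gaussian emissions; note that it decodes the path with the world distribution for simplicity, whereas the paper derives a proper synchronous criterion, and all model parameters are illustrative.

```python
import numpy as np
from scipy.stats import norm

S = 3                                          # shared left-to-right topology
logA = np.array([[np.log(0.9), np.log(0.1), -np.inf],
                 [-np.inf, np.log(0.9), np.log(0.1)],
                 [-np.inf, -np.inf, 0.0]])     # shared transition matrix
mu_c = np.array([0.0, 1.0, 2.0])               # client output distributions
mu_w = np.array([0.2, 0.8, 1.5])               # world output distributions

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(m, 0.3, 13) for m in mu_c])   # a client utterance

def viterbi(logB):
    """Best state path for a left-to-right HMM starting in state 0."""
    delta = np.full((len(x), S), -np.inf)
    psi = np.zeros((len(x), S), dtype=int)
    delta[0, 0] = logB[0, 0]
    for t in range(1, len(x)):
        scores = delta[t - 1][:, None] + logA
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[t]
    path = [delta[-1].argmax()]
    for t in range(len(x) - 1, 0, -1):
        path.append(psi[t, path[-1]])
    return np.array(path[::-1])

logB_c = norm.logpdf(x[:, None], mu_c, 0.5)    # frame log-likelihoods, client
logB_w = norm.logpdf(x[:, None], mu_w, 0.5)    # frame log-likelihoods, world
path = viterbi(logB_w)                         # ONE alignment pass

# Both hypotheses are scored along the same path
llr = (logB_c[np.arange(len(x)), path] - logB_w[np.arange(len(x)), path]).sum()
print(llr > 0.0)                               # accept/reject, illustrative threshold
```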

    Behavior of a Bayesian adaptation method for incremental enrollment in speaker verification

    Classical adaptation approaches are generally used for speaker or environment adaptation of speech recognition systems. In this paper, we use such techniques for the incremental training of client models in a speaker verification system. The initial model is trained on a very limited amount of data and then progressively updated with access data, using a segmental-EM procedure. In supervised mode (i.e. when access utterances are certified), the incremental approach yields performance equivalent to batch training. We also investigate the impact of various scenarios of impostor attacks during the incremental enrollment phase. All results are obtained with the Picassoft platform, the state-of-the-art speaker verification system developed in the PICASSO project.
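    The core of such Bayesian incremental enrollment is a MAP-style update: each new access blends the existing model parameters with the statistics of the new data, weighted by a relevance factor. A minimal single-Gaussian sketch is given below; the relevance factor, data, and number of accesses are illustrative, and the paper's segmental-EM procedure operates on full HMM/GMM client models rather than one mean.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 1.0
mu = rng.normal(true_mean, 0.5)      # initial model from very limited data
tau = 10.0                           # relevance factor: weight of the prior model

for access in range(20):             # successive (supervised) client accesses
    x = rng.normal(true_mean, 1.0, size=50)
    n = len(x)
    # MAP update: convex combination of the old mean and the new data mean
    mu = (tau * mu + n * x.mean()) / (tau + n)

print(mu)                            # drifts toward the true client statistics
```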

    Transcription of Spanish Historical Handwritten Documents with Deep Neural Networks

    The digitization of historical handwritten document images is important for the preservation of cultural heritage. Moreover, the transcription of text images obtained from digitization is necessary to provide efficient information access to the content of these documents. Handwritten Text Recognition (HTR) has become an important research topic in the areas of image and computational language processing that allows us to obtain transcriptions from text images. State-of-the-art HTR systems are, however, far from perfect. One difficulty is that they have to cope with image noise and handwriting variability. Another difficulty is the presence of a large number of Out-Of-Vocabulary (OOV) words in ancient historical texts. A solution to this problem is to use external lexical resources, but such resources might be scarce or unavailable given the nature and the age of such documents. This work proposes a solution to avoid this limitation: it associates a powerful optical recognition system, which copes with image noise and variability, with a language model based on sub-lexical units that models OOV words. Such a language modeling approach reduces the size of the lexicon while increasing lexicon coverage. Experiments are first conducted on the publicly available Rodrigo dataset, which contains the digitization of an ancient Spanish manuscript, with a recognizer based on Hidden Markov Models (HMMs). They show that sub-lexical units outperform word units in terms of Word Error Rate (WER), Character Error Rate (CER), and OOV word accuracy rate. The approach is then applied to deep net classifiers, namely Bi-directional Long-Short Term Memory networks (BLSTMs) and Convolutional Recurrent Neural Networks (CRNNs). Results show that CRNNs outperform HMMs and BLSTMs, reaching the lowest WER and CER for this image dataset and significantly improving OOV recognition. Work partially supported by projects READ: Recognition and Enrichment of Archival Documents - 674943 (European Union's H2020) and CoMUN-HaT: Context, Multimodality and User Collaboration in Handwritten Text Processing - TIN2015-70924-C2-1-R (MINECO/FEDER), and a DGA-MRIS (Direction Générale de l'Armement - Mission pour la Recherche et l'Innovation Scientifique) scholarship.
    Granell, E.; Chammas, E.; Likforman-Sulem, L.; Martínez-Hinarejos, C.; Mokbel, C.; Cirstea, B. (2018). Transcription of Spanish Historical Handwritten Documents with Deep Neural Networks. Journal of Imaging, 4(1). https://doi.org/10.3390/jimaging4010015
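    The sub-lexical idea can be illustrated very simply: a lexicon of sub-word units can recompose word forms never seen in training, so OOV words become reachable by the recognizer. The sketch below uses fixed-length character chunks as a crude stand-in for the paper's sub-lexical units (which are linguistically motivated); the tiny corpus is invented, and on real corpora the sub-word inventory is far smaller than the word lexicon.

```python
# Toy training "corpus" and a test phrase containing an OOV word form
train = "en el nombre de dios todo poderoso".split()
test = "nombres de dios".split()

word_lex = set(train)

def chunks(w, n=3):
    """Split a word into fixed-length character chunks (stand-in sub-lexical units)."""
    return [w[i:i + n] for i in range(0, len(w), n)]

sub_lex = {c for w in train for c in chunks(w)}

oov_words = [w for w in test if w not in word_lex]
covered = [w for w in test if all(c in sub_lex for c in chunks(w))]
print(oov_words)   # ['nombres'] is OOV at the word level...
print(covered)     # ...but fully covered by the sub-lexical units
```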

    Towards Introducing Long-Term Statistics in MUSE for Robust Speech Recognition

    In this paper, we propose new developments of the MUltipath Stochastic Equalization (MUSE) technique. MUSE is based on an enriched model of speech, composed of a classical HMM of clean speech together with equalization functions. This technique reduces the recognition error rate caused by a mismatch between training and testing conditions. In order to track long-term variation of this mismatch, we study the introduction of a priori statistics on the equalization function. In the case of bias removal, this approach has been implemented in HTK and tested on the Numbers95 database. Experiments show that the convergence of the bias computation is fast enough to limit the effect of the a priori values. Nevertheless, both the fast convergence property and the proposed framework open research directions towards more complex equalization functions.
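    In the bias-removal case, introducing a priori statistics amounts to a MAP-style blend of a prior bias with the running data estimate, so that the prior dominates when few frames are available and the data dominate as frames accumulate. Below is a minimal sketch of that blend; the prior, its weight, and the synthetic residuals are illustrative assumptions, not MUSE's actual parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
true_bias = np.array([0.8, -0.3])      # unknown channel/environment bias
prior_bias = np.zeros(2)               # a priori statistics (prior mean)
tau = 20.0                             # prior weight, in frames of equivalent data

# Residuals of test frames against the clean-speech model, offset by the bias
frames = rng.normal(0, 1, size=(500, 2)) + true_bias

running_sum, n = np.zeros(2), 0
for frame in frames:
    running_sum += frame
    n += 1
    # MAP bias estimate: prior and data, weighted by tau and frame count
    b_map = (tau * prior_bias + running_sum) / (tau + n)

print(b_map)   # with enough frames the data estimate dominates the prior
```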